Probabilistic Characterization of Decision Trees

Authors

  • Amit Dhurandhar
  • Alin Dobra
  • Leslie Pack Kaelbling
Abstract

In this paper we use the methodology introduced in Dhurandhar and Dobra (2006) for analyzing the error of classifiers and the model selection measures to analyze decision tree algorithms. The methodology consists of obtaining parametric expressions for the moments of the generalization error (GE) for the classification model of interest, followed by plotting these expressions for interpretability. The major challenge in applying the methodology to decision trees, the main theme of this work, is customizing the generic expressions for the moments of GE to this particular classification algorithm. The specific contributions we make in this paper are: (a) we completely characterize a subclass of decision trees, namely Random decision trees, (b) we discuss how the analysis extends to other decision tree algorithms, and (c) in order to extend the analysis to certain model selection measures, we generalize the relationships between the moments of GE and the moments of the model selection measures given in Dhurandhar and Dobra (2006) to randomized classification algorithms. An extensive empirical comparison between the proposed method and Monte Carlo estimation demonstrates the advantages of the method in terms of running time and accuracy. It also showcases the use of the method as an exploratory tool for studying learning algorithms.
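The abstract contrasts the paper's closed-form expressions for the moments of GE with Monte Carlo estimation. As an illustration only, the following minimal Python sketch shows what the Monte Carlo baseline looks like: the first two moments of GE for a randomized tree are estimated by repeatedly drawing a training set and a randomized tree. The synthetic data distribution, tree depth, and sample sizes are hypothetical choices, and scikit-learn's random-splitter trees are used here only as a stand-in for the paper's Random decision trees; this is not the authors' code or algorithm.

```python
# Illustrative Monte Carlo baseline (hypothetical setup, not the paper's method):
# estimate E[GE] and E[GE^2] over the joint randomness of the training sample
# and the randomized tree construction.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def sample_dataset(n):
    # Hypothetical synthetic distribution: two overlapping Gaussian classes.
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None], scale=1.5, size=(n, 2))
    return X, y

def one_generalization_error(n_train=100, n_test=5000):
    # One draw of (training set, randomized tree); GE approximated on a large test set.
    X_tr, y_tr = sample_dataset(n_train)
    X_te, y_te = sample_dataset(n_test)
    tree = DecisionTreeClassifier(splitter="random", max_depth=3,
                                  random_state=int(rng.integers(1 << 31)))
    tree.fit(X_tr, y_tr)
    return np.mean(tree.predict(X_te) != y_te)

# Monte Carlo estimates of the first two moments of GE.
errors = np.array([one_generalization_error() for _ in range(200)])
print("E[GE]   ≈", errors.mean())
print("E[GE^2] ≈", (errors ** 2).mean(), "  Var[GE] ≈", errors.var())
```

The paper's point is that such repeated sampling is expensive and noisy, whereas parametric expressions for these moments can be evaluated directly and then plotted to study the behavior of the learning algorithm.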

Related articles

Probabilistic analysis of the asymmetric digital search trees

In this paper, by applying three functional operators, we improve previous results on the (Poisson) variance of the external profile in digital search trees. We study the profile built over $n$ binary strings generated by a memoryless source with unequal symbol probabilities and use a combinatorial approach to study the Poissonized variance, since the probability distribution o...


A Theory of Probabilistic Boosting, Decision Trees and Matryoshki

We present a theory of boosting probabilistic classifiers. We place ourselves in the situation of a user who only provides a stopping parameter and a probabilistic weak learner/classifier and compare three types of boosting algorithms: probabilistic Adaboost, decision tree, and tree of trees of ... of trees, which we call matryoshka. “Nested tree,” “embedded tree” and “recursive tree” are also ...


Decision Trees and Forests: A Probabilistic Perspective

Decision trees and ensembles of decision trees are very popular in machine learning and often achieve state-of-the-art performance on black-box prediction tasks. However, popular variants such as C4.5, CART, boosted trees and random forests lack a probabilistic interpretation since they usually just specify an algorithm for training a model. We take a probabilistic approach where we cast the de...


Universal Aspects of Probabilistic Automata 3

For lack of composability of their morphisms, probability spaces, and hence probabilistic automata, fail to form categories; however, they fit into the more general framework of precategories, which are introduced and studied here. In particular, the notion of adjunction and weak adjunction for precategories is presented and justified in detail. As an immediate benefit, a concept of (weak) product ...


Sparse and Misaligned Data

In the learning and recognition methods we have introduced so far, we have not made strong assumptions about the data: nearest neighbor, SVM, distance metric learning, normalization and decision trees do not explicitly assume distributional properties of the data; PCA and FLD are optimal solutions under certain data assumptions, but they work well in many other situations too; parametric probabi...




Publication date: 2008